Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples

نویسندگان

  • Philippe Chouvarine
  • Lutz Wiehlmann
  • Patricia Moran Losada
  • David S. DeLuca
  • Burkhard Tümmler
چکیده

Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without reliance on the knowledge of genotypes and phenotypes of the members of the bacterial community. It also makes it possible to overcome problems of 16S rDNA sequencing, such as unknown copy number of the 16S gene and lack of sufficient sequence similarity of the "universal" 16S primers to some of the target 16S genes. On the other hand, next-generation sequencing suffers from biases resulting in non-uniform coverage of the sequenced genomes. To overcome this difficulty, we present a model of GC-bias in sequencing metagenomic samples as well as filtration and normalization techniques necessary for accurate quantification of microbial organisms. While there has been substantial research in normalization and filtration of read-count data in such techniques as RNA-seq or Chip-seq, to our knowledge, this has not been the case for the field of whole-metagenome shotgun sequencing. The presented methods assume that complete genome references are available for most microorganisms of interest present in metagenomic samples. This is often a valid assumption in such fields as medical diagnostics of patient microbiota. Testing the model on two validation datasets showed four-fold reduction in root-mean-square error compared to non-normalized data in both cases. The presented methods can be applied to any pipeline for whole metagenome sequencing analysis relying on complete microbial genome references. We demonstrate that such pre-processing reduces the number of false positive hits and increases accuracy of abundance estimates.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the developmen...

متن کامل

Exploration and retrieval of whole-metagenome sequencing samples

MOTIVATION Over the recent years, the field of whole-metagenome shotgun sequencing has witnessed significant growth owing to the high-throughput sequencing technologies that allow sequencing genomic samples cheaper, faster and with better coverage than before. This technical advancement has initiated the trend of sequencing multiple samples in different conditions or environments to explore the...

متن کامل

MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data

There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments and the analysis of such data is computationally challenging. To address this, we have ...

متن کامل

PALADIN: protein alignment for functional profiling whole metagenome shotgun data

Motivation Whole metagenome shotgun sequencing is a powerful approach for assaying the functional potential of microbial communities. We currently lack tools that efficiently and accurately align DNA reads against protein references, the technique necessary for constructing a functional profile. Here, we present PALADIN-a novel modification of the Burrows-Wheeler Aligner that provides accurate ...

متن کامل

Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. When compared with 16S-based metagenomics, it offers the advantage of identification of species level taxonomy and the estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects have been re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2016